---
title: "Profitability: Identifying profit-influencing categories using stepwise regression (PRECOVID and COVID periods)"
author: "Olanrewaju Olagunju"
date: "2023-03-11"
output:
  html_document: default
  pdf_document: default
---

<details>
  <summary>REQUIRED STEPS AND DATA PREPARATION (Click here)</summary>

## STEPS TO PREPARE DATA FOR THE ANALYSIS AND RUN THE MODEL
1. Filter the data to a specific period
2. Remove unneeded columns
3. Convert categorical variables to numeric dummy variables
4. Select the needed variables from the new data frame for the model
5. Run the regression model
6. Use the model in stepwise regression (forward or backward; this analysis uses backward elimination)
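Step 6 allows either direction, but the analysis in this document uses backward elimination only. The minimal sketch below shows both directions. It assumes the built-in `mtcars` data purely as a stand-in (it is not the profit data) and uses the same `k = log(n)` penalty as the models further down. Backward elimination starts from the full model and drops terms; forward selection starts from an intercept-only model and adds terms from the `scope` formula.

```{r}
# Illustration only: mtcars stands in for the profit data used in this document
data(mtcars)

full_model <- lm(mpg ~ ., data = mtcars)  # every other column as a predictor
null_model <- lm(mpg ~ 1, data = mtcars)  # intercept only
bic_k <- log(nrow(mtcars))                # same style of penalty as the models below

# Backward elimination: start from the full model and drop terms
backward_fit <- step(full_model, direction = "backward", k = bic_k, trace = 0)

# Forward selection: start from the intercept and add terms from the scope formula
forward_fit <- step(null_model, direction = "forward",
                    scope = formula(full_model), k = bic_k, trace = 0)

# Compare the retained predictors
attr(terms(backward_fit), "term.labels")
attr(terms(forward_fit), "term.labels")
```

The two directions can end at different models, so it may be worth checking both when the choice matters.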


## Check your working directory

```{r, results='hide'}
getwd()
```

# LOAD DATA
```{r}
CorRegdat <- read.csv("Rel.data4.csv")
```

```{r}
names(CorRegdat)
```
# INSTALL PACKAGES

```{r}
#install.packages("tidyverse")
#install.packages("corrplot")
#install.packages("caret")
#install.packages("polycor")
```
## LOAD PACKAGE
```{r}
library("tidyverse")
```

# PREPARE DATA FOR PreCOVID PERIOD ANALYSIS
## STEP 1
Using the initial CorRegdat data frame, select the variables to work with and keep only the PreCOVID observations.

```{r}
CorRegdatPx <- CorRegdat %>%
  select(T_Efficiency:FingSource) %>%
  filter(ProdPeriod == "PreCOVID")
names(CorRegdatPx)
```
```{r}
unique(CorRegdatPx$ProdPeriod)
```

## STEP 2
Remove the column "ProdPeriod", since it now holds only one value, along with the other unnecessary columns (State, AnnualProd and ProfitStatus).
```{r}
CorDatP <- CorRegdatPx %>%
  select(T_Efficiency:FingSource, -State, -AnnualProd, -ProdPeriod, -ProfitStatus)

names(CorDatP)
```
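Step 2 drops `ProdPeriod` because, after filtering, it holds a single value. A constant column carries no information for the regression: as a numeric variable its coefficient cannot be estimated (it is aliased with the intercept and reported as `NA`), and as a character or factor column `lm()` stops with a contrasts error. A small base-R sketch for spotting such columns follows; the helper `constant_columns()` and the toy data frame are hypothetical and made up for illustration.

```{r}
# Hypothetical helper: names of columns that hold at most one distinct value
constant_columns <- function(df) {
  is_const <- vapply(df, function(x) length(unique(x)) <= 1, logical(1))
  names(df)[is_const]
}

# Toy data (made up for illustration): 'period' is constant, as after filtering
toy_const <- data.frame(
  profit = c(10, 12, 9, 15),
  scale  = c("Small", "Large", "Small", "Large"),
  period = c("PreCOVID", "PreCOVID", "PreCOVID", "PreCOVID")
)

constant_columns(toy_const)

# A single-level categorical predictor makes lm() fail; capture the error message
res <- tryCatch(lm(profit ~ scale + period, data = toy_const),
                error = function(e) conditionMessage(e))
res
```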

## Take a glimpse to check that the variables are still in order
```{r}
#glimpse(CorDatP)
```

## STEP 3
Convert categorical variables to dummy variables using the "caret" package
```{r}
library("caret")
```

```{r}
# One indicator column per level of every factor (dummyVars' default is fullRank = FALSE)
dmyP <- dummyVars("~ .", data = CorDatP)

CorDatPTr <- data.frame(predict(dmyP, newdata = CorDatP))
```
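By default `dummyVars()` uses `fullRank = FALSE`, so each factor is expanded into one column per level. Those columns sum to 1 in every row, which makes them perfectly collinear with the intercept that `lm()` adds, so one level per factor is expected to appear as `NA` in the regression summary (the "dummy variable trap"). The sketch below reproduces this with base R's `model.matrix()` on a made-up toy data frame; the variable names are hypothetical.

```{r}
# Toy data (made up for illustration)
toy_dummy <- data.frame(
  profit = c(10, 12, 9, 15, 11, 14),
  scale  = factor(c("Small", "Large", "Medium", "Large", "Small", "Medium"))
)

# One column per level, similar to dummyVars(fullRank = FALSE)
one_hot <- model.matrix(~ scale - 1, data = toy_dummy)
rowSums(one_hot)  # every row sums to 1, duplicating the intercept column

# All level columns plus an intercept: one coefficient is aliased and reported as NA
trap_fit <- lm(toy_dummy$profit ~ one_hot)
coef(trap_fit)

# Full-rank coding (one reference level dropped) avoids the aliasing
ref_fit <- lm(profit ~ scale, data = toy_dummy)
coef(ref_fit)
```

Passing `fullRank = TRUE` to `dummyVars()` drops one reference level per factor, although the column range selected in Step 4 would then need checking, since a dropped level may be one of its endpoints.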
Confirm the number of observations and columns
```{r}
dim(CorDatPTr)
```
```{r}
#head(CorDatPTr)
```

## STEP 4
Select the needed variables from the data in which the categorical variables have been converted to numeric dummies

```{r}
RegDatP <- CorDatPTr %>%
  select(Profit.kg, ProdScaleLarge:FingSourceOther.farms.or.hatcheries)
names(RegDatP)
```

# PREPARE DATA FOR COVID PERIOD ANALYSIS

## STEP 1
Using the initial CorRegdat data frame, select the variables to work with and keep only the COVID observations.

```{r}
CorRegdatCx <- CorRegdat %>%
  select(T_Efficiency:FingSource) %>%
  filter(ProdPeriod == "COVID")
names(CorRegdatCx)
```

```{r}
unique(CorRegdatCx$ProdPeriod)
```
## STEP 2
Remove the column "ProdPeriod", since it now holds only one value, along with the other unnecessary columns (State, AnnualProd and ProfitStatus).
```{r}
CorDatC <- CorRegdatCx %>%
  select(T_Efficiency:FingSource, -State, -AnnualProd, -ProdPeriod, -ProfitStatus)

names(CorDatC)
```
## Take a glimpse to check that the variables are still in order
```{r}
#glimpse(CorDatC)
```

## STEP 3
Convert categorical variables to dummy variables using the "caret" package (loaded above)
```{r}
# One indicator column per level of every factor (dummyVars' default is fullRank = FALSE)
dmyC <- dummyVars("~ .", data = CorDatC)

CorDatCTr <- data.frame(predict(dmyC, newdata = CorDatC))
```
Confirm the number of observations and columns
```{r}
dim(CorDatCTr)
```
```{r}
#head(CorDatCTr)
```

## STEP 4
Select the needed variables from the data in which the categorical variables have been converted to numeric dummies

```{r}
RegDatC <- CorDatCTr %>%
  select(Profit.kg, ProdScaleLarge:FingSourceOther.farms.or.hatcheries)
names(RegDatC)
```
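Steps 1 to 4 differ between the two periods only in the `ProdPeriod` value, so the code could be written once and called for each period. The sketch below is a hypothetical base-R helper, `prepare_period()`, shown on made-up toy data with hypothetical column names (`profit`, `scale`, `state`, `period`). It swaps caret's `dummyVars()` for base R's `model.matrix()`, which drops one reference level per factor, and it assumes no missing values.

```{r}
# Hypothetical helper: Steps 1 to 4 for one period, in base R
# Assumes a response column 'profit', a period column 'period', and no missing values
prepare_period <- function(df, period, drop = character(0)) {
  # Steps 1 and 2: keep one period, then drop the period column and unwanted columns
  keep_cols <- setdiff(names(df), c("period", drop))
  sub <- df[df$period == period, keep_cols, drop = FALSE]
  # Step 3: character columns become factors, then full-rank dummy coding
  is_chr <- vapply(sub, is.character, logical(1))
  sub[is_chr] <- lapply(sub[is_chr], factor)
  dummies <- model.matrix(profit ~ ., data = sub)[, -1, drop = FALSE]  # drop intercept
  # Step 4: response plus dummy-coded predictors, ready for lm(profit ~ .)
  data.frame(profit = sub$profit, dummies)
}

# Toy data (made up for illustration)
toy_periods <- data.frame(
  profit = c(10, 12, 9, 15, 11, 14, 8, 13),
  scale  = rep(c("Small", "Large"), times = 4),
  state  = c("A", "B", "A", "B", "B", "A", "B", "A"),
  period = rep(c("PreCOVID", "COVID"), each = 4)
)

pre_dat   <- prepare_period(toy_periods, "PreCOVID", drop = "state")
covid_dat <- prepare_period(toy_periods, "COVID", drop = "state")
names(pre_dat)
```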
</details> 

# STEPWISE REGRESSION OF DATA

# Running stepwise regression with the stats package - PRECOVID

```{r}
library(stats)  # attached by default in R; loading it explicitly is harmless
```

## STEP 5 - PRECOVID
### Regression model - PRECOVID
Because many variables need to go into the model, it is easier not to type them all out. In the formula, a dot (.) stands for every column of `data` except the response. Alternatively, pass a column subset such as `myData[, 2:71]` as the data to include only those columns (columns 2 to 71 in that example), making sure the subset still contains the response variable.
```{r}
modelP <- lm(Profit.kg ~ ., data = RegDatP)
summary(modelP)
```
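A minimal sketch of both shortcuts follows, using the built-in `mtcars` data purely for illustration (`mpg` plays the role of the response). The dot and an explicitly built formula give the same coefficients, and subsetting the columns first limits the predictors the dot expands to.

```{r}
data(mtcars)

# Dot: every column except the response (mpg) becomes a predictor
fit_dot <- lm(mpg ~ ., data = mtcars)

# Explicit formula built from the column names gives the same model
predictors <- setdiff(names(mtcars), "mpg")
fit_explicit <- lm(reformulate(predictors, response = "mpg"), data = mtcars)

# Column subset: keep the response (column 1) plus columns 2 to 4 only
fit_subset <- lm(mpg ~ ., data = mtcars[, c(1, 2:4)])
names(coef(fit_subset))
```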
## STEP 6 - PRECOVID
### Stepwise regression - BACKWARD - PRECOVID
```{r}
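# k = log(n) applies the BIC penalty (log(n) per parameter) instead of AIC's k = 2;
# the printed trace still labels the criterion "AIC"
stepwise_modelPb <- step(modelP, direction = "backward", k = log(nrow(RegDatP)))

summary(stepwise_modelPb)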
```
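Setting `k = log(nrow(...))` replaces AIC's penalty of 2 per parameter with log(n), so the criterion `step()` minimizes is BIC, even though its trace is labelled "AIC". The sketch below, again on `mtcars` for illustration only, checks that the criterion returned by `extractAIC()` with this penalty differs from `stats::BIC()` only by a constant, so both rank candidate models the same way.

```{r}
data(mtcars)
n <- nrow(mtcars)

fit_big   <- lm(mpg ~ wt + hp + qsec, data = mtcars)
fit_small <- lm(mpg ~ wt, data = mtcars)

# extractAIC() returns c(edf, criterion); with k = log(n) the criterion is BIC-type
crit_big   <- extractAIC(fit_big, k = log(n))[2]
crit_small <- extractAIC(fit_small, k = log(n))[2]

# Differences between models agree with stats::BIC()
crit_big - crit_small
BIC(fit_big) - BIC(fit_small)
```

Because the constant cancels when models are compared, `step()` with this `k` performs BIC-based backward elimination.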


# Running stepwise regression with the stats package - COVID

```{r}
library(stats)
```
## STEP 5 - COVID
### Regression model- COVID
As in the PRECOVID model, the dot (.) includes every column of `RegDatC` except the response, `Profit.kg`.
```{r}
modelC <- lm(Profit.kg ~ ., data = RegDatC)
summary(modelC)
```
## STEP 6 - COVID
### Stepwise regression - BACKWARD - COVID
```{r}
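# k = log(n) applies the BIC penalty (log(n) per parameter) instead of AIC's k = 2;
# the printed trace still labels the criterion "AIC"
stepwise_modelCb <- step(modelC, direction = "backward", k = log(nrow(RegDatC)))

summary(stepwise_modelCb)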
```
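To see which categories the two periods have in common, the term labels kept by each stepwise model can be compared with set operations. A minimal sketch follows, using a hypothetical helper `kept_terms()`. Here `mtcars` is split by transmission type (`am`) purely as a stand-in for the PreCOVID and COVID subsets; it is not the profit data.

```{r}
# Hypothetical helper: predictors retained by a fitted (stepwise) lm
kept_terms <- function(fit) attr(terms(fit), "term.labels")

data(mtcars)
# Stand-in "periods": split mtcars by transmission type (illustration only)
group_a <- subset(mtcars, am == 0, select = -am)
group_b <- subset(mtcars, am == 1, select = -am)

step_a <- step(lm(mpg ~ wt + hp + qsec + disp, data = group_a),
               direction = "backward", k = log(nrow(group_a)), trace = 0)
step_b <- step(lm(mpg ~ wt + hp + qsec + disp, data = group_b),
               direction = "backward", k = log(nrow(group_b)), trace = 0)

intersect(kept_terms(step_a), kept_terms(step_b))  # kept in both groups
setdiff(kept_terms(step_a), kept_terms(step_b))    # kept only in group A
setdiff(kept_terms(step_b), kept_terms(step_a))    # kept only in group B
```

With the models in this document, the same hypothetical helper would be applied as `kept_terms(stepwise_modelPb)` and `kept_terms(stepwise_modelCb)`.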